Speaker recognition via fusion of subglottal features and MFCCs

Authors

Harish Arsikere

Hitesh Anand Gupta

Abeer Alwan

Abstract

Motivated by the speaker-specificity and stationarity of subglottal acoustics, this paper investigates the utility of subglottal cepstral coefficients (SGCCs) for speaker identification (SID) and verification (SV). SGCCs can be computed using accelerometer recordings of subglottal acoustics, but such an approach is infeasible in real-world scenarios. To estimate SGCCs from speech signals, we adopt the Bayesian minimum mean squared error (MMSE) estimator proposed in the speech-to-articulatory inversion literature. The joint distribution of SGCCs and speech MFCCs is modeled using the WashU-UCLA corpus (containing simultaneous recordings of speech and subglottal acoustics), and the resulting model is used to obtain an MMSE estimate of SGCCs from unseen (test) MFCCs. Cross-validation experiments on the WashU-UCLA corpus show that the estimation efficacy, on average, is speaker dependent. A score-level fusion of MFCC and SGCC systems outperforms the MFCC-only baseline in both SID and SV tasks. On the TIMIT database (SID), the relative reduction in identification error is 16, 40 and 51% for G.712-filtered (300–3400 Hz), narrowband (0–4000 Hz) and wideband (0–8000 Hz) speech, respectively. On the NIST 2008 database (SV), the relative reduction in equal error rate is 4 and 11% for 10 and 5 second utterances, respectively.
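The MMSE estimation step described above can be illustrated with a minimal sketch. It assumes the joint distribution of MFCCs (x) and SGCCs (y) is a Gaussian mixture model with full covariances, which the abstract does not state explicitly. Under that assumption the MMSE estimate is the posterior-weighted sum of per-component conditional means, E[y | x] = sum_k p(k | x) [mu_y,k + S_yx,k S_xx,k^-1 (x - mu_x,k)]. The function names, parameter layout and array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T so that sum(z**2) is the Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def mmse_estimate(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x; y].

    Assumed (hypothetical) layout:
      x       : (n, dx) observed frames, e.g. MFCCs
      weights : (K,) mixture weights
      means   : (K, dx + dy) joint means
      covs    : (K, dx + dy, dx + dy) joint full covariances
    Returns an (n, dy) array of estimates, e.g. SGCCs.
    """
    x = np.atleast_2d(x)
    n, n_comp = x.shape[0], len(weights)
    dy = means.shape[1] - dx
    log_post = np.empty((n, n_comp))
    cond_means = np.empty((n_comp, n, dy))
    for k in range(n_comp):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx = covs[k, :dx, :dx]
        s_yx = covs[k, dx:, :dx]
        # Unnormalized log posterior of component k given x.
        log_post[:, k] = np.log(weights[k]) + log_gaussian(x, mu_x, s_xx)
        # Conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x), one row per frame.
        cond_means[k] = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
    # Normalize posteriors in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the per-component conditional means.
    return np.einsum("nk,knd->nd", post, cond_means)
```

As a sanity check, a single component with joint mean (0, 1) and covariance [[2, 1], [1, 2]] gives, for x = 2, the estimate 1 + (1/2)(2 - 0) = 2. In practice the mixture parameters would be fitted on paired training data such as the simultaneous speech and accelerometer recordings mentioned above.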

Similar resources

Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition

The recently introduced mean Hilbert envelope coefficients (MHEC) have been shown to be an effective alternative to MFCCs for robust speaker identification under noisy and reverberant conditions in relatively small tasks. In this study, we investigate the effectiveness of these acoustic features in the context of a state-of-the-art speaker recognition system. The i-vectors are used to represent...

Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features

Speaker verification in real-world applications sometimes deals with limited duration of enrollment and/or test data. MFCC-based i-vector systems have defined the state-of-the-art for speaker verification, but it is well known that they are less effective with short utterances. To address this issue, we propose a method to leverage the speaker specificity and stationarity of subglottal acoustic...

Speaker verification based on fusion of acoustic and articulatory information

We propose a practical, feature-level fusion approach for speaker verification using information from both acoustic and articulatory signals. We find that concatenating articulation features obtained from actual speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the overall speaker verification performance. However, since access to actual speech produc...

Invariant integration features combined with speaker-adaptation methods

Speaker-normalization and -adaptation methods are essential components of state-of-the-art speech recognition systems nowadays. Recently, so-called invariant integration features were presented which are motivated by the theory of invariants. While it was shown that the integration features outperform MFCCs when used with a basic monophone recognition system, it was left open, if their benefits...

Robust speaker identification via fusion of subglottal resonances and cepstral features.

This letter investigates the use of subglottal resonances (SGRs) for noise-robust speaker identification (SID). It is motivated by the speaker specificity and stationarity of subglottal acoustics, and the development of noise-robust SGR estimation algorithms which are reliable at low signal-to-noise ratios for large datasets. A two-stage framework is proposed which combines the SGRs with differ...

Publication date: 2014
